Load some data

1
import hotstepper as hs
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
2
df_data = pd.read_excel('https://raw.githubusercontent.com/TangleSpace/hotstepper-data/master/data/superstore.xls',
                            parse_dates=['Order Date','Ship Date'])
df_data.head()

2
Row ID Order ID Order Date Ship Date Ship Mode Customer ID Customer Name Segment Country City ... Postal Code Region Product ID Category Sub-Category Product Name Sales Quantity Discount Profit
0 1 CA-2016-152156 2016-11-08 2016-11-11 Second Class CG-12520 Claire Gute Consumer United States Henderson ... 42420 South FUR-BO-10001798 Furniture Bookcases Bush Somerset Collection Bookcase 261.9600 2 0.00 41.9136
1 2 CA-2016-152156 2016-11-08 2016-11-11 Second Class CG-12520 Claire Gute Consumer United States Henderson ... 42420 South FUR-CH-10000454 Furniture Chairs Hon Deluxe Fabric Upholstered Stacking Chairs,... 731.9400 3 0.00 219.5820
2 3 CA-2016-138688 2016-06-12 2016-06-16 Second Class DV-13045 Darrin Van Huff Corporate United States Los Angeles ... 90036 West OFF-LA-10000240 Office Supplies Labels Self-Adhesive Address Labels for Typewriters b... 14.6200 2 0.00 6.8714
3 4 US-2015-108966 2015-10-11 2015-10-18 Standard Class SO-20335 Sean O'Donnell Consumer United States Fort Lauderdale ... 33311 South FUR-TA-10000577 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 957.5775 5 0.45 -383.0310
4 5 US-2015-108966 2015-10-11 2015-10-18 Standard Class SO-20335 Sean O'Donnell Consumer United States Fort Lauderdale ... 33311 South OFF-ST-10000760 Office Supplies Storage Eldon Fold 'N Roll Cart System 22.3680 2 0.20 2.5164

5 rows × 21 columns

All we need as a start is to usually have the data in a Pandas Dataframe. Fun fact, if you set the field type of a datetime field within the dataframe, either with the parse_dates option when reading a file or explicitly using apply pd.to_datetime directly on the field, when you tell HotStepper to use that field as either a start or end step key, Hotstepper will automatically detect the type and set the use_datetime=True flag for you.

For example,

3
sales_orders = hs.Steps.read_dataframe(
        df_data, #the dataframe we will read from
        start='Order Date', #field name of the start keys
        end='Ship Date', #field name of the end keys
        weight='Profit') #field name of the weight or value of each step

sales_orders.using_datetime()
3
True
4
ax = sales_orders.plot()
sales_orders.describe()
4
Metric Value
0 Count 1434
1 Mean 761.85
2 Median 513.1
3 Mode -6927.35
4 Std 1363.45
5 Var 1859001.89
6 Min -6927.35
7 25% 140.24
8 75% 1195.82
9 Max 11642.96
10 Area 26750137.57
11 Start 2014-01-03 00:00:00
12 End 2018-01-05 00:00:00
../_images/notebooks_load_data_5_1.svg

We can also load data into a Steps object a few more ways, let’s go with a raw example and use the read_array method.

5
sales_orders_arr = hs.Steps.read_array(
        start=df_data['Order Date'], #array of values for the start keys
        end=df_data['Ship Date'],    #array of values for the end keys
        weight=df_data['Profit'])    #array of values for the step weights or values

ax = sales_orders_arr.plot()
ax.axhline(0,color='r');
../_images/notebooks_load_data_7_0.svg